Unit 4: Data Cleanse Transforms 
Figure 20: Business Need for Data Cleanse Transform 2 





Strategies for Data Cleansing 


How you configure your Data Cleanse transforms depends on the type of data you are 
cleansing. 


Name, title, firm, and firm location data 


You can standardize name data and generate discrete standardized fields for prename, first 
name, middle name, last name, maturity post name, and honorary post name. Which of these 
fields you generate depends on the fields you plan to evaluate when determining whether two 
records match. 


For first names and middle names, you can generate up to six first name match standards and 
up to six middle name match standards. Although up to six of each can be generated, matching 
can use at most three first name match standards and at most three middle name match 
standards. 
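
The generate-six, match-on-three limit described above can be sketched as follows. This is a 
minimal illustration in Python, not Data Cleanse configuration: the lookup table, the function 
names, and the rule that two names match when their candidate sets overlap are all assumptions 
made for the example. 

```python
from typing import List, Set

MAX_GENERATED = 6   # match standards that can be generated per name
MAX_USED = 3        # match standards that can take part in matching

# Hypothetical lookup table; a real cleansing dictionary is far larger.
_MATCH_STANDARDS = {
    "bill": ["william"],
    "pat": ["patrick", "patricia"],
}


def generate_match_standards(first_name: str) -> List[str]:
    """Return up to MAX_GENERATED match standards for a first name."""
    return _MATCH_STANDARDS.get(first_name.strip().lower(), [])[:MAX_GENERATED]


def match_candidates(first_name: str) -> Set[str]:
    """The name itself plus at most MAX_USED of its match standards."""
    standards = generate_match_standards(first_name)[:MAX_USED]
    return {first_name.strip().lower(), *standards}


def first_names_match(a: str, b: str) -> bool:
    """Assumed rule: two names match if their candidate sets overlap."""
    return bool(match_candidates(a) & match_candidates(b))
```

With this table, `first_names_match("Bill", "William")` is true, while 
`first_names_match("Bill", "Pat")` is false. 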


The Data Cleanse transform also parses up to six job titles per record, up to two firm names 
(such as IBM), and up to two firm locations (such as Engineering Department). This transform 
can also convert firm names to accepted acronyms, such as General Motors to GM. 


Social Security number data 


Data Cleanse parses US Social Security numbers (SSNs) that appear either by themselves or on 
an input line surrounded by other text. Data Cleanse outputs the individual components of a 
parsed Social Security number: the entire SSN, the area, the group, and the serial. 


Data Cleanse parses Social Security numbers in two steps. First, it identifies a potential SSN 
by looking for any of three patterns. Once a pattern is identified, Data Cleanse performs a 
validity check on the first five digits only (the area and the group). If the number fails 
validation, it is not output, because it is not considered a valid SSN as defined by the US 
government. 
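
The two-step approach can be sketched in Python as below. This is an illustration under stated 
assumptions, not the transform's actual logic: the text does not list the three patterns, so 
the sketch assumes nine consecutive digits, digits separated by spaces, and digits separated by 
hyphens. The validity rule is also a simplified stand-in (area not 000, 666, or 900-999; group 
not 00) for whatever table the product consults. All names are hypothetical. 

```python
import re
from typing import NamedTuple, Optional


class ParsedSSN(NamedTuple):
    ssn: str     # the entire SSN, digits only
    area: str    # first three digits
    group: str   # next two digits
    serial: str  # last four digits


# Step 1: assumed patterns nnnnnnnnn, nnn nn nnnn, and nnn-nn-nnnn.
# The backreference \2 forces the same separator in both positions.
_SSN_RE = re.compile(r"(?<!\d)(\d{3})([- ]?)(\d{2})\2(\d{4})(?!\d)")


def _first_five_valid(area: str, group: str) -> bool:
    """Step 2: simplified check on the first five digits only (an assumption)."""
    return area not in ("000", "666") and not area.startswith("9") and group != "00"


def parse_ssn(line: str) -> Optional[ParsedSSN]:
    """Return the first candidate on the line that passes validation, else None."""
    for m in _SSN_RE.finditer(line):
        area, _sep, group, serial = m.groups()
        if _first_five_valid(area, group):
            return ParsedSSN(area + group + serial, area, group, serial)
    return None
```

Because the serial is never inspected, a candidate is rejected only on its area or group, 
which mirrors the "first five digits only" behavior described above. 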


Email data 


When Data Cleanse parses input data that it recognizes as an email address, it outputs the 
individual components of the parsed address: the email user name, complete domain name, top 
domain, second domain, third domain, fourth domain, fifth domain, and host name. 


You can also verify that an email address is properly formatted and flag the address as 
belonging to an internet service provider (ISP). Data Cleanse does not verify that the 
domain name is registered, that an email server is active at that address, that the user 
name is registered on that email server, or that the person named in the record can be 
reached at this email address. 
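
The component breakdown and format check can be sketched in Python as follows. This is a 
rough illustration, not the transform's implementation. It assumes that domains are numbered 
from the right (the top domain is the last label) and that any labels left over after the 
fifth domain form the host name. The format regex, the ISP list, and the field names are 
hypothetical. 

```python
import re
from typing import Dict, Optional

# Format check only: no DNS lookup, no mail server or mailbox verification.
_EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")

# Hypothetical ISP list, used only to show the flag.
_ISP_DOMAINS = {"aol.com", "comcast.net"}

_DOMAIN_FIELDS = ["top_domain", "second_domain", "third_domain",
                  "fourth_domain", "fifth_domain"]


def parse_email(address: str) -> Optional[Dict[str, object]]:
    """Split a well-formed address into components; return None otherwise."""
    address = address.strip()
    if not _EMAIL_RE.match(address):
        return None
    user, domain = address.rsplit("@", 1)
    labels = domain.lower().split(".")
    from_right = labels[::-1]  # top domain first
    parsed: Dict[str, object] = {
        "user": user,
        "domain": domain,
        "is_isp": domain.lower() in _ISP_DOMAINS,
    }
    for i, field in enumerate(_DOMAIN_FIELDS):
        parsed[field] = from_right[i] if i < len(from_right) else ""
    # Assumption: labels to the left of the fifth domain are the host name.
    parsed["host"] = ".".join(labels[:-5]) if len(labels) > 5 else ""
    return parsed
```

For example, `parse_email("expat@london.home.office.city.co.uk")` yields a top domain of 
`uk`, a second domain of `co`, and, under the assumption above, a host name of `london`. 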